Abstract

In humans, learning depends on the joint contribution of multiple interacting systems: working memory (WM), long-term memory (LTM), and reinforcement learning (RL). The present study aims to understand the relative contributions of these systems during learning, as well as the specific strategies individuals rely on. Collins (2018) put forward a combined working memory-reinforcement learning model that addresses this question, but it largely ignores long-term memory. We built four idiographic ACT-R learning models (single-mechanism RL and LTM models, plus two integrated RL-LTM models: a meta-learning RL model and a parameterized RL-bias model) using the Collins (2018) stimulus-response association task. Different models provided the best fits for different individual learners (LTM: 63%, RL: 1%, meta-RL: 12%, bias-RL: 21% of participants), which suggests that irreducible differences in learning and meta-learning strategies exist across individuals. The models predicted learning accuracy, learning rate, and test accuracy for subjects in their respective groups.

Objectives

This report describes the four ACT-R models and the learning outcomes produced by changes in their parameters. It also describes how these models fit the behavioral data and details the properties of the best-fitting models and parameters. The specific objective of this project is to test whether the RLWM task can be modeled well by a group of pure and combined declarative and RL learning models. After fitting the models to participant data, we aim to extract parameters that may explain why and how learning unfolded as observed. If the parameters describe individual differences in learning, do they also predict other behavioral measures such as working memory capacity and reinforcement learning accuracy?

ACT-R Models

Below are the four ACT-R models tested. Note that the bolded names appear throughout this document.

  • RL: A pure RL model based on production-utility learning in ACT-R. Learning rate (alpha) and softmax temperature are its only two parameters.

  • LTM: A declarative model that depends solely on the storage and retrieval of stimulus, response, and outcome in ACT-R’s declarative memory. This model depends on decay rate, retrieval noise, and spreading activation.

  • meta_RL: A combined RL-LTM model. Information about trials performed by the RL system is shared with and stored in LTM (declarative memory) for later use. An isolated (meta) RL system (a set of productions) learns which sub-system, RL or LTM, to use throughout learning. Which subsystem is preferred depends on the specific set of parameters.

  • biased: A combined RL-LTM model in which information about trials performed by the RL system is not shared with the LTM portion of the model. An additional “strategy” parameter specifies a bias toward the RL model at 20, 40, 60, and 80 percent of the learning and test trials.
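
The two mechanisms of the pure RL model can be sketched in a few lines. This is a minimal illustration of utility learning and softmax (Boltzmann) selection as described above, not the actual ACT-R implementation; the function names are ours:

```python
import math
import random

def update_utility(utility, reward, alpha):
    """ACT-R-style utility learning: U <- U + alpha * (R - U)."""
    return utility + alpha * (reward - utility)

def softmax_choice(utilities, temperature):
    """Choose an action index with probability proportional to exp(U / temperature)."""
    exps = [math.exp(u / temperature) for u in utilities]
    r = random.random() * sum(exps)
    cumulative = 0.0
    for i, e in enumerate(exps):
        cumulative += e
        if r < cumulative:
            return i
    return len(exps) - 1  # guard against floating-point round-off
```

Higher temperatures make choices more exploratory; larger alpha values make utilities track recent rewards more aggressively.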

Approach

The models are fit to behavioral data, and the best-fitting model and parameter set are selected by comparing BIC; the lowest BIC value determines the winning model. To assess the quality of the fitted models and parameters, RLWM task learning features were compared to the model outcomes. The features of interest are:

  • Accuracy at the end of learning (accuracy after 12 stimulus presentations)
  • Accuracy at test
  • Change in accuracy from end of learning to test
  • Learning rate
  • Differences in the learning trajectories of the two set sizes

The expectations and outcomes are described below.
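
The selection rule just described can be sketched as follows; the helper names and the example fit values in the note below are hypothetical:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def best_model(fits, n_obs):
    """fits maps model name -> (log-likelihood, number of parameters).
    Returns the name of the model with the lowest BIC."""
    return min(fits, key=lambda name: bic(*fits[name], n_obs))
```

For instance, with `fits = {"RL": (-100.0, 2), "LTM": (-95.0, 3)}` and 300 observations, `best_model` prefers LTM: its extra parameter costs less in the penalty term than it gains in likelihood.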

Results

Model fits

Of the four models compared, the LTM model fit the largest number of participants (54), followed by the biased version of the combined RL-LTM model (18) and the meta-RL combined model in third place (10). The pure RL model best fit only one participant (Figure 1). This is a slight departure from our expectation that the combined RL-LTM models would fit the majority of participants. As observed, it suggests that most learners simply commit the stimulus-response associations to memory.

Figure 1.


Within each group of participants (groups formed by preferred model type), there is only one best-fitting combination of values for the RL model’s alpha and softmax parameters. For the most popular model, LTM, which fit 54 participants, there were, surprisingly, only 13 best-fitting parameter-value sets for the spreading activation, retrieval noise, and memory decay rate parameters. The biased model was the most diverse, with 17 parameter sets for 18 participants. The meta-RL model closely followed the biased model in diversity of parameter-value sets, with 8 sets for 10 subjects. Figures 2 and 3 show the medians and ranges of the BIC values, which show the LTM model to be the best-fitting model even when comparing only the BIC values for the parameter-value sets that fit participants best in each model category.
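
For reference, retrieval in the LTM model is governed by ACT-R’s base-level learning, decay, and noise. A minimal sketch of the standard equations (simplified: spreading activation is omitted, and the logistic approximation is used for retrieval probability):

```python
import math

def base_level_activation(lags, decay):
    """B = ln(sum over presentations of t_j^(-d)), where t_j is the time
    since presentation j and d is the decay-rate parameter."""
    return math.log(sum(t ** (-decay) for t in lags))

def retrieval_probability(activation, threshold, noise_s):
    """Probability that a chunk beats the retrieval threshold, given
    logistic activation noise with scale parameter s."""
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise_s))
```

More presentations and shorter lags raise activation, so well-practiced stimulus-response chunks are retrieved more reliably; larger decay or noise values erode that advantage.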

Figure 2.


Figure 3.


How consistent are the fits observed above? Given a participant, how many of the next-best parameter sets fall in the same model category?

model mean median sd min max
biased 16.00000 7.5 20.71657 2 70
LTM 29.14815 7.5 43.52115 2 115
metaRL 22.50000 7.5 33.17044 2 107
RL 7.00000 7.0 NA 7 7
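
The consistency question can be operationalized as a run length: ranking a participant’s parameter sets by BIC, how many consecutive entries share the best-fitting model’s category before the first switch? A sketch under that interpretation (the function name is ours):

```python
def run_in_same_category(ranked_models, best):
    """ranked_models: model categories of a participant's parameter sets,
    ordered from lowest (best) BIC to highest. Returns the length of the
    initial run that matches the best-fitting category."""
    count = 0
    for model in ranked_models:
        if model != best:
            break
        count += 1
    return count
```

For example, `run_in_same_category(["LTM", "LTM", "biased", "LTM"], "LTM")` returns 2: only the first two ranked fits stay within the LTM category.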

subjects X1 X2 X3 model
6209 0.3835607 4.5885937 1.8903284 biased
6231 13.4544158 5.2886433 1.1094847 LTM
6234 25.2794767 1.2106092 0.3310663 LTM
6235 13.4538969 3.9865741 4.8191421 biased
6241 6.9816209 2.6221561 1.8669564 metaRL
15001 15.9719760 14.1763784 0.0770518 LTM
15004 0.7905286 2.9912596 1.4152323 metaRL
15015 4.0968626 1.8076198 2.3458932 metaRL
28307 25.0758619 0.8837227 0.1963839 LTM
29305 2.8349094 9.8399144 0.4102510 biased

Assessments of model fits

Looking at the learning curves for the four models in Figure 4, the differences in learning rates are apparent, as are other features such as the separation between the two set sizes. In the plot, each data point is the average accuracy, at that number of stimulus presentations, across all parameter combinations. The LTM and RL models predict that an increase in set size does not diminish learning rate or accuracy. However, this analysis washes out the individual differences that could be captured by the diverse set of parameter combinations.

Figure 4.


model setSize N accuracy sd se ci
bias s3 12500 0.6945808 0.2914194 0.0026065 0.0051092
bias s6 12500 0.6683000 0.2750870 0.0024605 0.0048229
LTM s3 125 0.9915467 0.0056882 0.0005088 0.0010070
LTM s6 125 0.9887067 0.0099097 0.0008863 0.0017543
metaRL s3 3125 0.8858709 0.0726997 0.0013005 0.0025499
metaRL s6 3125 0.9066187 0.0767423 0.0013728 0.0026917
RL s3 25 0.7668000 0.1684144 0.0336829 0.0695180
RL s6 25 0.7726667 0.1721191 0.0344238 0.0710473

The panels in Figure 5 show the mean accuracy for participant behavioral data. The model lines are averages across parameters for that group only. Since we are taking an individual-differences view of these data, collapsing across so much of this variability is uninformative, as shown above in Figure 4, especially if the differences, once fit to actual behavioral data, indicate large differences in learning outcomes or cognitive-faculty diagnostics such as working memory capacity. Here, only the best-fitting sets of parameter combinations were selected and collapsed.

As can be seen in the figure, the model types appear vastly different, and some characteristics of the behavioral data have come through, such as the separation of the learning trajectories for the different set sizes in the biased RL-LTM model fit. Some parameter sets in the LTM model also capture the difficulty associated with increasing set size (solid lines in Fig. 5B). The LTM participants have, on average, the highest accuracies in the testing phase for both set sizes, but they are nearly indistinguishable from the meta-RL group in accuracy at the end of learning. The biased group shows the greatest separation between set sizes 3 and 6 during learning, and also lower accuracy at test than the LTM group. The biased group differs negligibly from the meta-RL group at set size 3 but shows a marked difference at set size 6, closely following the behavioral data.

Figure 5.


There are five outcome measures of interest in the RLWM task:

  • Accuracy at the end of learning
  • Accuracy at test
  • Learning rate, characterized as the number of stimulus presentations needed to reach 85% accuracy and as the beta estimate for the first 6 trials
  • The difference in learning between set sizes 3 and 6
  • The level of preserved learning at test for both set sizes

The following analyses compare the model data with the behavioral data.
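
The two learning-rate measures can be computed directly from a per-presentation accuracy curve. A sketch, with an ordinary least-squares slope standing in for the beta estimate (a simplification of the actual regression fit):

```python
def presentations_to_criterion(accuracies, criterion=0.85):
    """1-based index of the first stimulus presentation at which accuracy
    reaches the criterion; None if it is never reached."""
    for i, acc in enumerate(accuracies, start=1):
        if acc >= criterion:
            return i
    return None

def slope_first_n(accuracies, n=6):
    """OLS slope of accuracy over the first n presentations."""
    ys = accuracies[:n]
    xs = range(1, len(ys) + 1)
    mean_x = sum(xs) / len(ys)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator
```

Steeper slopes and fewer presentations to criterion both indicate faster learning; the two measures can disagree for curves that start high but plateau early.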

Figure 6.

term df sumsq meansq statistic p.value
setSize 1 0.0000641 0.0000641 0.0048209 0.9446873
iteration 1 1.2908057 1.2908057 97.1443094 0.0000000
setSize:iteration 1 0.1081829 0.1081829 8.1416983 0.0046007
Residuals 328 4.3583024 0.0132875 NA NA
#> 
#>  Welch Two Sample t-test
#> 
#> data:  dat4.t.test$s3 and dat4.t.test$s6
#> t = 2.6285, df = 125.39, p-value = 0.009647
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#>  0.009137611 0.064824906
#> sample estimates:
#> mean of x mean of y 
#> 0.9362450 0.8992637
#> 
#>  Wilcoxon rank sum test with continuity correction
#> 
#> data:  dat4.t.test$s3 and dat4.t.test$s6
#> W = 4025.5, p-value = 0.05762
#> alternative hypothesis: true location shift is not equal to 0
Figure 7.


Figure 8.


setSize mean(estimate) median(estimate)
s3 0.1148164 0.1154762
s6 0.0800440 0.0825397
Figure 8.


estimate estimate1 estimate2 statistic p.value parameter conf.low conf.high method alternative
0.0347724 0.1148164 0.080044 10.14921 0 142.2575 0.0279997 0.0415451 Welch Two Sample t-test two.sided
#> # A tibble: 16 x 5
#> # Groups:   setSize, type [4]
#>    setSize type  model    mean        se
#>    <chr>   <chr> <chr>   <dbl>     <dbl>
#>  1 s3      behav biased 0.107   0.00392 
#>  2 s3      behav LTM    0.118   0.00230 
#>  3 s3      behav metaRL 0.112   0.00565 
#>  4 s3      behav RL     0.117  NA       
#>  5 s3      model biased 0.0941  0.00552 
#>  6 s3      model LTM    0.105   0.000870
#>  7 s3      model metaRL 0.102   0.00412 
#>  8 s3      model RL     0.130  NA       
#>  9 s6      behav biased 0.0603  0.00715 
#> 10 s6      behav LTM    0.0878  0.00303 
#> 11 s6      behav metaRL 0.0753  0.00581 
#> 12 s6      behav RL     0.0667 NA       
#> 13 s6      model biased 0.0615  0.00533 
#> 14 s6      model LTM    0.103   0.00139 
#> 15 s6      model metaRL 0.0973  0.00530 
#> 16 s6      model RL     0.133  NA
model type meanS3 meanS6 mean_diff
biased behav 0.1074074 0.0602734 0.0471340
biased model 0.0941217 0.0615132 0.0326085
LTM behav 0.1177910 0.0877572 0.0300338
LTM model 0.1046173 0.1027672 0.0018501
metaRL behav 0.1119048 0.0753175 0.0365873
metaRL model 0.1019810 0.0972905 0.0046905
RL behav 0.1166667 0.0666667 0.0500000
RL model 0.1300000 0.1327143 -0.0027143
Figure 9.


#> Analysis of Variance Table
#> 
#> Response: learnDiff
#>           Df  Sum Sq  Mean Sq F value   Pr(>F)    
#> model      3 0.28124 0.093746  23.228 7.05e-11 ***
#> Residuals 79 0.31884 0.004036                     
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
statistic p.value parameter method
31.84158 1e-07 2 Kruskal-Wallis rank sum test
group1 group2 p.value
LTM biased 0.0000000
metaRL biased 0.0000000
metaRL LTM 0.4367617
Figure 10.


It is difficult to assess what the model fits are capturing without examining the specific parameter sets more carefully, or without determining whether membership in a particular model group predicts other cognitive or learning characteristics of the subjects. First, for the cohort of subjects

Parameters

Parameter spread

Parameter summary: what is the spread of the parameters across participants in the models?
Figure 11.


Individual parameter effects on outcomes

Figure 12


Combined effect of parameters on outcomes

These plots show that, in the biased model, most subjects sit at a very low percentage of RL use. However, higher rates of RL use, or a more even split between RL and LTM, indicate a separation between set size 3 and set size 6 learning accuracy.

If that is the case, is the inclusion of the RL component a vital part of their learning make-up, however small it is? This plot shows what this group would have looked like had they relied only on LTM.

How about some K-means clustering?
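
As an illustration of what clustering the best-fitting parameter vectors might look like, here is a minimal pure-Python k-means; it is illustrative only, and in practice a library implementation would be run on the actual parameter sets:

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Minimal k-means over small tuples of parameter values."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        # Assign each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return centers, clusters
```

Run on, say, (decay rate, retrieval noise) pairs, the resulting clusters could then be checked against model-group membership or the WM measures discussed below.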

Some specific plans are to estimate the three LTM parameters for all 83 participants and see whether they relate to the WM and PSS measures. Also, how do the parameters relate to the “separation” between set sizes 3 and 6?

Some more specific things to test might be the effect of delay between stimulus presentations.

What are the differences in learning type in terms of behavioral outcomes in other tasks?

These plots show group effects, for uCLIMB subjects only, on the Python and OLCTS measures and behavioral predictors.

We have 3Back and PSS measures for a large majority of participants; what are the group differences, if any, in these outcomes based on model fit?

Chantel’s request: combine language and programming measures and compare groups.

EEG Beta analysis

Individual plots: